Goto

Collaborating Authors

 Tulsa County



MARL Warehouse Robots

Allman, Price, Thang, Lian, Simmons, Dre, Riaz, Salmon

arXiv.org Artificial Intelligence

Our research investigates the complex task of multiple autonomous agents learning to coordinate and deliver packages in warehouse environments--a problem requiring implicit communication, collision avoidance, and efficient task allocation without centralized control. Traditional warehouse automation relies on centralized planning systems that face scalability limitations; multi-agent reinforcement learning (MARL) offers an alternative through decentralized learned policies, but requires solving the credit assignment problem. We compare MARL algorithms on warehouse coordination: QMIX [Rashid et al., 2018] (value decomposition), IPPO (independent learning), and MASAC (centralized critic). Our study progresses from MPE for validation to RWARE for warehouse evaluation, culminating in Unity 3D deployment where agents demonstrate learned package delivery behavior. QMIX emerged as the best performer after systematic comparison. Our contributions: (1) hyperparameter analysis showing default configurations fail on sparse-reward warehouse tasks, (2) comparative evaluation across algorithms and scales, (3) Unity ML-Agents integration demonstrating sim-to-sim transfer with successful package delivery, and (4) identification of scaling challenges. Full experimental details and results are documented in our Quarto documentation book. 1


Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks

Shen, Huanming, Huang, Baizhou, Wan, Xiaojun

arXiv.org Artificial Intelligence

Watermarking is a promising defense against the misuse of large language models (LLMs), yet it remains vulnerable to scrubbing and spoofing attacks. This vulnerability stems from an inherent trade-off governed by watermark window size: smaller windows resist scrubbing better but are easier to reverse-engineer, enabling low-cost statistics-based spoofing attacks. This work breaks this trade-off by introducing a novel mechanism, equivalent texture keys, where multiple tokens within a watermark window can independently support the detection. Based on the redundancy, we propose a novel watermark scheme with Sub-vocabulary decomposed Equivalent tExture Key (SEEK). It achieves a Pareto improvement, increasing the resilience against scrubbing attacks without compromising robustness to spoofing. Experiments demonstrate SEEK's superiority over prior method, yielding spoofing robustness gains of +88.2%/+92.3%/+82.0% and scrubbing robustness gains of +10.2%/+6.4%/+24.6% across diverse dataset settings.


Beyond Satisfaction: From Placebic to Actionable Explanations For Enhanced Understandability

Shymanski, Joe, Brue, Jacob, Sen, Sandip

arXiv.org Artificial Intelligence

Explainable AI (XAI) presents useful tools to facilitate transparency and trustworthiness in machine learning systems. However, current evaluations of system explainability often rely heavily on subjective user surveys, which may not adequately capture the effectiveness of explanations. This paper critiques the overreliance on user satisfaction metrics and explores whether these can differentiate between meaningful (actionable) and vacuous (placebic) explanations. In experiments involving optimal Social Security filing age selection tasks, participants used one of three protocols: no explanations, placebic explanations, and actionable explanations. Participants who received actionable explanations significantly outperformed the other groups in objective measures of their mental model, but users rated placebic and actionable explanations as equally satisfying. This suggests that subjective surveys alone fail to capture whether explanations truly support users in building useful domain understanding. We propose that future evaluations of agent explanation capabilities should integrate objective task performance metrics alongside subjective assessments to more accurately measure explanation quality.


Vector Arithmetic in Concept and Token Subspaces

Feucht, Sheridan, Wallace, Byron, Bau, David

arXiv.org Artificial Intelligence

In order to predict the next token, LLMs must represent semantic and surface-level information about the current word. Previous work identified two types of attention heads that disentangle this information: (i) Concept induction heads, which copy word meanings, and (ii) Token induction heads, which copy literal token representations (Feucht et al., 2025). We show that these heads can be used to identify subspaces of model activations that exhibit coherent semantic structure in Llama-2-7b. Specifically, when we transform hidden states using the attention weights of concept heads, we are able to more accurately perform parallelogram arithmetic (Mikolov et al., 2013) on the resulting hidden states, e.g., showing that "Athens" - "Greece" + "China" = "Beijing". This transformation allows for much higher nearest-neighbor accuracy (80%) than direct use of raw hidden states (47%). Analogously, we show that token heads allow for transformations that reveal surface-level word information in hidden states, allowing for operations like "coding" - "code" + "dance" = "dancing".



Few-Shot Multimodal Medical Imaging: A Theoretical Framework

Mohsin, Md Talha, Abdulrashid, Ismail

arXiv.org Machine Learning

Medical imaging relies heavily on large, labeled datasets. But, unfortunately, they are not always easily accessible in clinical settings. Additionally, many practitioners often face various structural obstacles like limited data availability, fragmented data systems, and unbalanced datasets. These barriers often lead to the increased diagnostic uncertainty, underrepresentation of certain conditions, reduced model robustness, and biased diagnostic decisions. In response to these challenges, approaches such as transfer learning, meta-learning, and multimodal fusion have made great strides. However, they still need a solid theoretical justification for why they succeed or fail in situations where data is scarce. To address this gap, we propose a unified theoretical framework that characterizes learning and inference under low-resource medical imaging conditions. We first formalize the learning objective under few-shot conditions and compute sample complexity constraints to estimate the smallest quantity of data needed to achieve clinically reliable accuracy. Then based on ideas from PAC-learning and PAC-Bayesian theory, we explain how multimodal integration encourages generalization and quantifies uncertainty under sparse supervision. We further propose a formal metric for explanation stability, offering interpretability guarantees under low-data conditions. Taken together, the proposed framework establishes a principled foundation for constructing dependable, data-efficient diagnostic systems by jointly characterizing sample efficiency, uncertainty quantification, and interpretability in a unified theoretical setting.


Sequences of Logits Reveal the Low Rank Structure of Language Models

Golowich, Noah, Liu, Allen, Shetty, Abhishek

arXiv.org Machine Learning

A major problem in the study of large language models is to understand their inherent low-dimensional structure. We introduce an approach to study the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first empirically demonstrate that a wide range of modern language models exhibit low-rank structure: in particular, matrices built from the model's logits for varying sets of prompts and responses have low approximate rank. We then show that this low-rank structure can be leveraged for generation -- in particular, we can generate a response to a target prompt using a linear combination of the model's outputs on unrelated, or even nonsensical prompts. On the theoretical front, we observe that studying the approximate rank of language models in the sense discussed above yields a simple universal abstraction whose theoretical predictions parallel our experiments. We then analyze the representation power of the abstraction and give provable learning guarantees.



Blockchain-Enabled Explainable AI for Trusted Healthcare Systems

Mohsin, Md Talha

arXiv.org Artificial Intelligence

This paper introduces a Blockchain-Integrated Explainable AI Framework (BXHF) for healthcare systems to tackle two essential challenges confronting health information networks: safe data exchange and comprehensible AI-driven clinical decision-making. Our architecture incorporates blockchain, ensuring patient records are immutable, auditable, and tamper-proof, alongside Explainable AI (XAI) methodologies that yield transparent and clinically relevant model predictions. By incorporating security assurances and interpretability requirements into a unified optimization pipeline, BXHF ensures both data-level trust (by verified and encrypted record sharing) and decision-level trust (with auditable and clinically aligned explanations). Its hybrid edge-cloud architecture allows for federated computation across different institutions, enabling collaborative analytics while protecting patient privacy. We demonstrate the framework's applicability through use cases such as cross-border clinical research networks, uncommon illness detection and high-risk intervention decision support. By ensuring transparency, auditability, and regulatory compliance, BXHF improves the credibility, uptake, and effectiveness of AI in healthcare, laying the groundwork for safer and more reliable clinical decision-making.